Multi-modal Interactive Video Retrieval with Temporal Queries
Abstract
This paper presents the version of vitrivr participating at the Video Browser Showdown (VBS) 2022. vitrivr already supports a wide range of query modalities, such as color and semantic sketches, OCR, ASR, and text embedding. In this paper, we briefly introduce the system, then describe our new approach to queries specifying temporal context, ideas for color-based sketches in a competitive retrieval setting, and novel pose-based queries.
Similar resources
Multi-modal query expansion for video object instances retrieval
In this paper we tackle the issue of object instances retrieval in video repositories using minimum information from the user (e.g., textual description/tags). Starting from a set of tags, images containing the object of interest are crawled from popular image search engines and repositories (e.g., Bing, Flickr, Google) and the positive and most representative instances of the object are automati...
Multi-modal Classifier Fusion for Video Shot Content Retrieval
In this paper we present a new chromosome to solve the problem of classifier fusion using genetic algorithm. Experiments are conducted in the context of TRECVID. In particular we focus on the feature extraction task that consists in retrieving video shots expressing one of predefined semantic concepts. Three modalities (visual, textual and motion) and two features per modality are used to descr...
Interactive Multi-Modal Robot Programming
As robots enter the human environment and come in contact with inexperienced users, they need to be able to interact with users in a multi-modal fashion—keyboard and mouse are no longer acceptable as the only input modalities. This paper introduces a novel approach to program a robot interactively through a multi-modal interface. The key characteristic of this approach is that the user can prov...
Multi-modal Medical Image Retrieval
Images are ubiquitous in biomedicine and the image viewers play a central role in many aspects of modern health care. Tremendous amounts of medical image data are captured and recorded in digital format during the daily clinical practice, medical research, and education (in 2009, over 117,000 images per day in the Geneva radiology department alone). Facing such an unprecedented volume of image ...
Multi-Modal Fashion Product Retrieval
Finding a product in the fashion world can be a daunting task. Every day, e-commerce sites are updated with thousands of images and their associated metadata (textual information), deepening the problem. In this paper, we leverage both the images and textual metadata and propose a joint multi-modal embedding that maps both the text and images into a common latent space. Distances in the latent ...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2022
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-98355-0_44